Two-stage method to remove population- and individual-level outliers from longitudinal data in a primary care database.
Abstract
PURPOSE: In the UK, primary care databases include repeated measurements of health indicators at the individual level. Because these databases cover a large population, some individuals have genuinely extreme values, but some values may also be recorded incorrectly. The challenge for researchers is to distinguish records that result from incorrect recording from those that represent true but extreme values. This study evaluated different methods to identify outliers.

METHODS: Ten percent of practices were selected at random to evaluate the recording of 513,367 height measurements. Population-level outliers were identified using boundaries derived from Health Survey for England data. Individual-level outliers were identified by fitting a random-effects model with subject-specific slopes for height measurements, adjusted for age and sex. Any height measurement with a patient-level standardised residual more extreme than ±10 was identified as an outlier and excluded. The model was subsequently refitted twice after removing outliers at each stage. This method was compared with existing methods of removing outliers.

RESULTS: Most outliers (1550 of 1643) were identified at the population level using the boundaries derived from the Health Survey for England. Once these were removed from the database, fitting the random-effects model to the remaining data identified only 75 further outliers. This method was more efficient at identifying true outliers than existing methods.

CONCLUSIONS: We propose a new, two-stage approach to identifying outliers in longitudinal data and show that it can successfully identify outliers at both the population and the individual level.

Copyright © 2011 John Wiley & Sons, Ltd.
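The two-stage idea described in the abstract can be sketched roughly as follows. This is an illustrative simplification, not the authors' implementation. The population bounds are placeholder values rather than the Health Survey for England boundaries. The random-effects model is replaced by a much simpler stand-in: each patient's median height, with residuals scaled by a pooled median absolute deviation. The record layout (`patient`, `height`) and the function names are assumptions made for the example. Only the ±10 cutoff and the "fit, then refit twice" loop are taken from the abstract.

```python
from statistics import median

# Placeholder bounds in cm; NOT the Health Survey for England boundaries.
POP_BOUNDS_CM = (100.0, 220.0)
# Cutoff on the absolute standardised residual, as stated in the abstract.
RESIDUAL_CUTOFF = 10.0


def population_filter(records, bounds=POP_BOUNDS_CM):
    """Stage 1: split records into those inside and outside the bounds."""
    low, high = bounds
    kept = [r for r in records if low <= r["height"] <= high]
    removed = [r for r in records if not (low <= r["height"] <= high)]
    return kept, removed


def individual_filter(records, cutoff=RESIDUAL_CUTOFF, fits=3):
    """Stage 2: flag values far from each patient's own typical height.

    Stand-in for the random-effects model: the residual is the distance
    from the patient's median height, standardised by a pooled robust
    scale (1.4826 * median absolute residual across all patients).
    """
    kept, removed = list(records), []
    for _ in range(fits):  # one initial fit plus two refits
        if not kept:
            break
        heights_by_patient = {}
        for r in kept:
            heights_by_patient.setdefault(r["patient"], []).append(r["height"])
        centre = {p: median(h) for p, h in heights_by_patient.items()}
        resid = [r["height"] - centre[r["patient"]] for r in kept]
        scale = 1.4826 * median([abs(e) for e in resid])
        if scale == 0:
            break  # no spread to standardise against
        flagged = [abs(e) / scale > cutoff for e in resid]
        if not any(flagged):
            break  # nothing left to remove, so refitting changes nothing
        removed += [r for r, f in zip(kept, flagged) if f]
        kept = [r for r, f in zip(kept, flagged) if not f]
    return kept, removed
```

Running `population_filter` first and passing its kept records to `individual_filter` mirrors the order in the abstract: gross recording errors, such as a height entered in the wrong unit, are removed before they can distort the patient-level residuals.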
Similar resources
Identification of outliers types in multivariate time series using genetic algorithm
Multivariate time series data are often modeled using a vector autoregressive moving average (VARMA) model. However, the presence of outliers can violate the stationarity assumption and may lead to incorrect modeling, biased parameter estimates and inaccurate predictions. Thus, detection of these points and how to deal properly with them, especially in relation to modeling and parameter estimation of VARMA m...
Exploring Health System Responsiveness in Ambulatory Care and Disease Management and its Relation to Other Dimensions of Health System Performance (RAC) – Study Design and Methodology
Background The responsiveness of a health system is considered to be an intrinsic goal of health systems and an essential aspect in performance assessment. Numerous studies have analysed health system responsiveness and related concepts, especially across different countries and health systems. However, fewer studies have applied the concept for the evaluation of specific healthcare delivery s...
Semi-parametric Quantile Regression for Analysing Continuous Longitudinal Responses
Recently, quantile regression (QR) models have often been applied to longitudinal data analysis. When the distribution of responses appears skewed and asymmetric because of outliers and heavy tails, QR models may be suitable. In this paper, a semi-parametric quantile regression model is developed for analysing continuous longitudinal responses. The error term's distribution is assumed to be Asymmetr...
Methods for identifying outliers in medical studies
Background: An outlier is an observation that lies an abnormal distance from other values in a random sample from a population. Outliers can sometimes lead to abnormal results in analyses of collected data. It is important for researchers, physicians and others working in the medical sciences to recognise outliers, and they must check data before getting result a...
A study of the treatment-seeking behavior of Tehran residents and the factors affecting it
Background and Aim: Factors determining the health care-seeking behaviors of an individual are social, cultural, and economic (treatment costs). Utilization of a health care system by a person will, on the whole, depend mainly on the socio-economic and demographic factors, cultural beliefs and practices, gender discrimination and women's status, the economic and political systems, environment, ...
Journal title:
- Pharmacoepidemiology and Drug Safety
Volume 21, Issue 7
Pages -
Publication date: 2012